
Exploration of metadata

library("tidyverse")
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("here")
here() starts at /home/marco/group_09_project
library("ggridges")
library("patchwork")
library("viridis")
Loading required package: viridisLite
library("table1")

Attaching package: 'table1'

The following objects are masked from 'package:base':

    units, units<-
source("99_proj_func.R")
data_path <- here("data/06_dat_augmented.RData")
load(data_path)

First, we want to get an overview of the metadata present in the dataset, before we analyze it.

The table below summarizes the metadata by glucose tolerance group. Only one person is underweight, so we exclude this class when we later look at how values are distributed across BMI classes: a single sample is not enough to describe a group.

dataset |> 
  table1::table1(x = formula(~ BMI_class + HOMA_category + cholesterol + statins + insulin| glucose_tolerance),
         data = _)
Variable                      normal              impaired            t2d                 Overall
                              (N=43)              (N=49)              (N=53)              (N=145)
BMI_class
  Underweight                 1 (2.3%)            0 (0%)              0 (0%)              1 (0.7%)
  Normal weight               18 (41.9%)          17 (34.7%)          13 (24.5%)          48 (33.1%)
  Overweight                  17 (39.5%)          23 (46.9%)          23 (43.4%)          63 (43.4%)
  Obese                       7 (16.3%)           9 (18.4%)           17 (32.1%)          33 (22.8%)
HOMA_category
  Healthy                     19 (44.2%)          7 (14.3%)           7 (13.2%)           33 (22.8%)
  At risk                     15 (34.9%)          26 (53.1%)          8 (15.1%)           49 (33.8%)
  Insulin Resistant           8 (18.6%)           14 (28.6%)          20 (37.7%)          42 (29.0%)
  Severely Insulin Resistant  0 (0%)              2 (4.1%)            18 (34.0%)          20 (13.8%)
  Missing                     1 (2.3%)            0 (0%)              0 (0%)              1 (0.7%)
cholesterol
  Mean (SD)                   5.74 (0.847)        5.60 (0.910)        5.08 (1.07)         5.45 (0.990)
  Median [Min, Max]           5.74 [3.53, 7.62]   5.49 [4.04, 7.46]   4.97 [3.42, 8.71]   5.41 [3.42, 8.71]
statins
  n                           33 (76.7%)          33 (67.3%)          27 (50.9%)          93 (64.1%)
  y                           10 (23.3%)          16 (32.7%)          26 (49.1%)          52 (35.9%)
insulin
  n                           43 (100%)           49 (100%)           47 (88.7%)          139 (95.9%)
  y                           0 (0%)              0 (0%)              6 (11.3%)           6 (4.1%)
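
The decision to leave out a class with too few samples can also be made in code rather than by eye. The following is a minimal sketch on a toy tibble, not part of the project scripts: `toy` and the `min_group_size` cutoff are illustrative assumptions, and the real `dataset` is not touched.

```r
library(dplyr)

# Toy stand-in for `dataset`; the values are illustrative only
toy <- tibble(
  BMI_class = c("Underweight", "Normal weight", "Normal weight",
                "Overweight", "Overweight", "Obese", "Obese")
)

# Hypothetical cutoff: minimum number of samples needed to keep a class
min_group_size <- 2

# Number of samples in each BMI class
class_counts <- toy |>
  count(BMI_class, name = "n_samples")

# Keep only the samples that belong to a sufficiently large class
kept <- toy |>
  add_count(BMI_class, name = "n_samples") |>
  filter(n_samples >= min_group_size) |>
  select(-n_samples)
```

In this toy example the single underweight row is dropped, which mirrors the `filter(BMI_class != "Underweight")` step used in the BMI-class plots later in this document.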

Our dataset contains samples from individuals with type II diabetes, with impaired glucose tolerance, or with normal glucose tolerance.

We have access to a lot of information on markers that correlate with the onset of diabetes, but let’s start with the basics.

General visualization

distribution_by_country <- dataset |> 
  ggplot(aes(x = fct_infreq(country))) + 
  geom_bar(fill = "skyblue", color = "black") + 
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5, size = 3) +
  labs(
    title = "Distribution of samples by country",
    x = "Country",
    y = "Number of Samples"
  ) +
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5))

save_plot_custom(
  plot = distribution_by_country,
  filename = "07_distribution_of_sample_by_countries.jpg"
)
distribution_by_country

distribution_by_age <- dataset |> 
  ggplot(aes(x = age)) + 
  geom_histogram(fill = "skyblue", color = "black", binwidth = 0.5) + 
  geom_vline(
    aes(xintercept = mean(age, na.rm = TRUE)),
    color = "black", 
    linetype = "dashed", 
    linewidth = 0.5
  ) +
  labs(
    title = "Distribution of Samples by Age",
    x = "Age",
    y = "Number of Samples"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, hjust = 0.5)
  )

save_plot_custom(
  plot = distribution_by_age,
  filename = "07_distribution_of_sample_by_age.jpg"
)
distribution_by_age

Our samples were collected from women living in central European and Nordic countries, mainly Sweden, aged 69 to 72. This matches what we expect from the paper.

Diabetes specific visualization

Now we move on to more diabetes-specific metadata.

distribution_by_glucose_tolerance <- dataset |>
  count(glucose_tolerance) |>
  mutate(percentage = n / sum(n) * 100) |>
  ggplot(aes(x = glucose_tolerance,
             y = percentage,
             fill = glucose_tolerance,
             label = str_c(round(percentage,digits = 2),"%"))) +
  geom_col(colour = "black",
           alpha = 0.6) +
  theme_minimal(base_size = 15) +
  labs(x = "",
       y = "",
       title = "Sample distribution in glucose tolerance groups") +
  geom_hline(yintercept = 0) +
  geom_text(vjust = -0.5, size = 5)+
  ylim(0,45)+
  theme(legend.position = "none",
        axis.text.y = element_blank())

save_plot_custom(
  plot = distribution_by_glucose_tolerance,
  filename = "07_distribution_of_sample_by_glucose_tolerance.jpg"
)
distribution_by_glucose_tolerance

First, we look at how the samples are distributed across the glucose tolerance classes. The three groups are of similar size (43, 49 and 53 samples), so each group is large enough for the later visualizations to be informative.

distribution_by_BMI <- dataset |>
  count(BMI_class) |>
  mutate(percentage = n / sum(n) * 100) |>
  ggplot(aes(x = BMI_class,
             y = percentage,
             fill = BMI_class,
             label = str_c(round(percentage,digits = 2),"%"))) +
  geom_col(colour = "black",
           alpha = 0.6) +
  theme_minimal(base_size = 15) +
  labs(x = "",
       y = "",
       title = "Distribution of samples into BMI categories") +
  geom_hline(yintercept = 0) +
  geom_text(vjust = -0.5, size = 5)+
  ylim(0,45)+
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        plot.title = element_text(hjust = 0.5))

save_plot_custom(
  plot = distribution_by_BMI,
  filename = "07_distribution_of_sample_by_BMI.jpg"
  )
distribution_by_BMI

Then, we look at the distribution of BMI classes. About two thirds of the subjects are overweight or obese. This makes sense, as a higher BMI is known to increase the risk of developing diabetes.

bmi_vs_glucose_tolerance <- dataset |> 
  mutate(bmi_group = cut(
    x = bmi,
    breaks = seq(from = 18,
                 to = 45,
                 by = 3))) |> 
  count(glucose_tolerance, bmi_group) |> 
  ggplot(aes(x = bmi_group,
             y = n,
             fill = glucose_tolerance)) +
  geom_col(position = position_dodge(
    preserve = "single"),
    colour = "black",
    alpha = 0.4) +
  geom_hline(yintercept = 0) +
  theme_minimal(base_size = 10) +
  labs(x = "BMI",
       y = "Count",
       title = "BMI count by glucose tolerance",
       fill = "Glucose tolerance: ") +
  theme(legend.position = "bottom",
        panel.grid.major.x = element_blank(),
        axis.text.x = element_text(vjust = 5))

save_plot_custom(
  plot = bmi_vs_glucose_tolerance,
  filename = "07_bmi_vs_glucose_tolerance.jpg"
)
bmi_vs_glucose_tolerance

BMI is an estimate of body fat based on a person's weight and height. It is useful when studying diabetes, as obesity significantly increases the risk of developing the disease (reference).
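
As a small illustration of how a continuous BMI value maps to a class, the sketch below uses the standard WHO adult cut-offs (18.5, 25 and 30 kg/m²) with base R `cut()`. This is an assumption about how a column such as `BMI_class` could be derived; the project script that actually creates it is not shown here, and the example values are made up.

```r
# BMI is weight in kilograms divided by squared height in metres
weight_kg <- 70
height_m <- 1.75
bmi_example <- weight_kg / height_m^2

# Illustrative BMI values, one per class
bmi <- c(17.9, 22.4, 25.1, 31.0)

# WHO adult cut-offs; right = FALSE makes intervals closed on the left,
# so a BMI of exactly 25 counts as "Overweight"
bmi_class <- cut(
  x = bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal weight", "Overweight", "Obese"),
  right = FALSE
)
```

With these cut-offs a BMI of 25.1 falls in the overweight class, which agrees with the class reported for sample S239 later in this document.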

stacked_bmi_vs_glucose_tolerance <- dataset |> 
  ggplot(aes(x = BMI_class, fill = glucose_tolerance)) + 
  geom_bar(color = "black") + 
  labs(
    x = "BMI",
    y = "Number of Samples"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, hjust = 0.5)
  )

save_plot_custom(
  plot = stacked_bmi_vs_glucose_tolerance,
  filename = "07_stacked_bmi_vs_glucose_tolerance.jpg"
)
stacked_bmi_vs_glucose_tolerance

Visualization of different parameters vs glucose tolerance

Here we will focus just on visualizing the relationship between glucose tolerance and many other variables. The statistical significance of these results will be analysed in the analysis part of the project.

Markers for both type I and II diabetes

ridges_hba1c_vs_glucose_tolerance <- dataset |> 
  ggplot(mapping = aes(x = hba1c,
                       y = glucose_tolerance,
                       fill = glucose_tolerance )) +
    geom_density_ridges(alpha = 0.5) +
    labs(x = "HbA1c [mmol/mol]",
         y = "glucose tolerance",
         title = "HbA1c and glucose tolerance") +
    theme_minimal(base_family = "Avenir",
                  base_size = 12) +
    theme(legend.position = "none")

save_plot_custom(
  plot = ridges_hba1c_vs_glucose_tolerance,
  filename = "07_hba1c_vs_glucose_tolerance.jpg"
)
Picking joint bandwidth of 1.84
ridges_hba1c_vs_glucose_tolerance
Picking joint bandwidth of 1.84

Hemoglobin A1C (HbA1c) is hemoglobin bound to glucose and can be used as a measure of the average blood glucose level. High blood glucose is one of the hallmarks of diabetes, and the plot shows that HbA1c is higher in diabetic patients (reference).

hba1c_48_threshold <- dataset |>
  group_by(glucose_tolerance) |> 
  # Share of samples with HbA1c above 48 mmol/mol in each group;
  # samples with a missing HbA1c value are ignored
  summarise(percentage = mean(hba1c > 48, na.rm = TRUE) * 100) |>
  ggplot(aes(x = glucose_tolerance,
             y = percentage,
             fill = glucose_tolerance,
             label = str_c(round(percentage,
                                 digits = 2),
                           "%"))) +
  geom_col(colour = "black",
           alpha = 0.4) +
  labs(x = "",
       y = "%",
       title = "% of people with HbA1c > 48 by glucose tolerance") +
  geom_hline(yintercept = 0) +
  geom_text(vjust = -0.5, size = 5) +
  # Percentages range from 0 to 100, so the axis must cover that range
  ylim(0, 100) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

save_plot_custom(
  plot = hba1c_48_threshold,
  filename = "07_hba1c_48_threshold.jpg"
)
hba1c_48_threshold

An HbA1c of 48 mmol/mol is commonly used as the diagnostic threshold for diabetes (reference).

ridges_wc_vs_glucose_tolerance <- dataset |> 
  ggplot(mapping = aes(x = wc,
                      y = glucose_tolerance,
                      fill = glucose_tolerance )) +
    geom_density_ridges(alpha = 0.5) +
    labs(x = "Waist circumference [cm]",
         y = "",
         title = "Glucose Tolerance vs Waist Circumference") +
  theme_minimal(base_family = "Avenir",
                  base_size = 12) +
    theme(legend.position = "none")

save_plot_custom(
  plot = ridges_wc_vs_glucose_tolerance,
  filename = "07_ridges_wc_vs_glucose_tolerance.jpg"
)
Picking joint bandwidth of 3.76
ridges_wc_vs_glucose_tolerance
Picking joint bandwidth of 3.76

We saw that waist circumference (WC) was a statistically significant predictor of type 2 diabetes. In the plot above, we observe that the distribution for the type 2 diabetes group is shifted to the right, towards larger waist circumferences.

ridges_hdl_vs_glucose_tolerance <- dataset |> 
  ggplot(mapping = aes(x = hdl,
                      y = glucose_tolerance,
                      fill = glucose_tolerance )) +
    geom_density_ridges(alpha = 0.5) +
    labs(x = "HDL [mmol/L]",
         y = "Glucose Tolerance",
         title = "HDL and glucose tolerance ") +
    theme_minimal(base_family = "Avenir",
                  base_size = 12) +
    theme(legend.position = "none")

save_plot_custom(
  plot = ridges_hdl_vs_glucose_tolerance,
  filename = "07_ridges_hdl_vs_glucose_tolerance.jpg"
)
Picking joint bandwidth of 0.202
ridges_hdl_vs_glucose_tolerance
Picking joint bandwidth of 0.202

We investigate how the HDL distribution differs between the glucose tolerance groups. For the type 2 diabetes group, the distribution is shifted slightly to the left, towards lower HDL values (reference).

Markers for discriminating type I and type II

fasting_insuling_vs_glucose_tolerance_bar <- dataset |> 
  mutate(fasting_insulin_group = cut(
    x = fasting_insulin,
    breaks = seq(from = 0,
                 to = 70,
                 by = 5))) |> 
  count(glucose_tolerance, fasting_insulin_group) |> 
  ggplot(aes(x = fasting_insulin_group,
             y = n,
             fill = glucose_tolerance)) +
  geom_col(position = position_dodge(
    preserve = "single"),
    colour = "black",
    alpha = 0.4) +
  geom_hline(yintercept = 0) +
  theme_minimal(base_size = 10) +
  labs(x = "Fasting insulin [pmol/L]",
       y = "n",
       title = "Fasting insulin count grouped by glucose tolerance",
       fill = "Glucose tolerance: ")+
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        panel.grid.major.x = element_blank(),
        axis.text.x = element_text(vjust = 5))

save_plot_custom(
  plot = fasting_insuling_vs_glucose_tolerance_bar,
  filename = "07_fasting_insuling_vs_glucose_tolerance_bar.jpg"
)
fasting_insuling_vs_glucose_tolerance_bar

From the literature we know that insulin remains in the blood longer in type II diabetic patients. This can be seen in the graph above, where only diabetic subjects are present in the highest fasting insulin bins.

fasting_insuling_vs_glucose_tolerance <- dataset |> 
  ggplot(mapping = aes(x = fasting_insulin,
                      y = glucose_tolerance,
                      fill = glucose_tolerance )) +
    geom_density_ridges(alpha = 0.5) +
    labs(x = "Fasting insulin [pmol/L]",
         y = "Glucose tolerance",
         title = "Fasting insulin and glucose tolerance ") +
    theme_minimal(base_family = "Avenir",
                  base_size = 12) +
    theme(legend.position = "none")

save_plot_custom(
  plot = fasting_insuling_vs_glucose_tolerance,
  filename = "07_fasting_insuling_vs_glucose_tolerance_ridges.jpg"
)
Picking joint bandwidth of 2.16
fasting_insuling_vs_glucose_tolerance
Picking joint bandwidth of 2.16

We also look at how fasting insulin is distributed across the glucose tolerance groups, and we see that fasting insulin tends to be higher in the type 2 diabetes group.

c_peptide_vs_bmi_glucose_tolerance <- dataset|>
  filter(BMI_class != "Underweight") |> 
  ggplot(aes(x = glucose_tolerance,
             y = `c-peptide`,
             fill = BMI_class)) +
  geom_boxplot(position = position_dodge(
    preserve = "single"), 
    alpha = 0.4) +
  theme_minimal(base_size = 15) +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5)) +
  # Labels match the mapping above: x is glucose tolerance, fill is BMI class
  labs(x = "Glucose tolerance",
       y = "c-peptide [ng/mL]",
       title = "c-peptide concentration",
       fill = "BMI class: ")

save_plot_custom(
  plot = c_peptide_vs_bmi_glucose_tolerance,
  filename = "07_c_peptide_vs_bmi_glucose_tolerance.jpg"
)
c_peptide_vs_bmi_glucose_tolerance

The c-peptide is a 31-amino-acid sequence that connects the A and B chains of proinsulin and is cleaved off when insulin is produced. Its level is therefore used to distinguish between patients who produce insulin (type II) and patients who do not (type I). The insulin level is higher in type II, as the body tries to maintain normal blood sugar levels (reference).

gad_ab_vs_bmi_glucose_tolerance <- dataset|>
  filter(BMI_class != "Underweight") |> 
  filter(`gad-antibodies` < 20) |> 
  ggplot(aes(x = BMI_class,
             y = `gad-antibodies`,
             fill = glucose_tolerance)) +
  geom_boxplot(position = position_dodge(
    preserve = "single"), 
    alpha = 0.4) +
  theme_minimal(base_size = 15) +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5)) +
    labs(x = "BMI class",
       y = "GAD-antibodies [U/mL]",
       title = "GAD-antibodies concentration",
       fill = "Glucose tolerance: ")

save_plot_custom(
  plot = gad_ab_vs_bmi_glucose_tolerance,
  filename = "07_gad_ab_vs_bmi_glucose_tolerance.jpg"
)
gad_ab_vs_bmi_glucose_tolerance

We know from the literature that GAD antibodies are used to differentiate between type 1 and type 2 diabetes. In type 1, insulin production decreases, which can be caused by the production of GAD antibodies that lead the immune system to attack the insulin-producing beta cells of the pancreas. Here we do not see changes in GAD antibodies, as our dataset contains samples from type 2 diabetes patients (reference).

dataset |>
  filter(`gad-antibodies` > 20)
# A tibble: 1 × 434
  sampleID glucose_tolerance is_diseased   age country   bmi BMI_class    hdl
  <chr>    <fct>                   <dbl> <dbl> <chr>   <dbl> <fct>      <dbl>
1 S239     impaired                    1    71 sweden   25.1 Overweight  1.21
# ℹ 426 more variables: ldl <dbl>, hdl_to_ldl_ratio <dbl>,
#   `gad-antibodies` <dbl>, whr <dbl>, wc <dbl>, cholesterol <dbl>,
#   triglycerides <dbl>, creatinine <dbl>, `y-gt` <dbl>, fasting_glucose <dbl>,
#   fasting_insulin <dbl>, `HOMA-IR` <dbl>, HOMA_category <fct>, hba1c <dbl>,
#   adiponectin <dbl>, leptin <dbl>, hscrp <dbl>, `c-peptide` <dbl>,
#   tnfa <dbl>, statins <chr>, insulin <chr>,
#   s_Methanobrevibacter_smithii <dbl>, …

This single sample shows a GAD antibody concentration more compatible with type I diabetes.

Additional exploratory plots

linear_wc_hba1c_glucose_tolerance <- dataset |>
  ggplot(aes(x = wc,
             y = hba1c,
             colour = glucose_tolerance)) +
  geom_point(size = 2,
             alpha = 0.4) +
  geom_smooth(method = "lm",
              se = FALSE) +
  labs(x = "WC [cm]",
       y = "HbA1c [mmol/mol]",
       title = "WC vs HbA1c grouped by glucose tolerance",
       colour = "Glucose tolerance: ")+
  theme_minimal(base_size = 12)

save_plot_custom(
  plot = linear_wc_hba1c_glucose_tolerance,
  filename = "07_linear_wc_hba1c_glucose_tolerance.jpg"
)
`geom_smooth()` using formula = 'y ~ x'
linear_wc_hba1c_glucose_tolerance
`geom_smooth()` using formula = 'y ~ x'

We were then interested in the relationship between WC, HbA1c, and glucose tolerance. The data points for the type 2 diabetes group lie apart from those of the normal and impaired groups, which is consistent with both HbA1c and WC correlating positively with diabetes.
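
The separation seen in the scatter plot can be backed by a per-group numeric summary. The sketch below is only an illustration on made-up values: `toy` is a hypothetical stand-in for `dataset`, and the column names simply reuse `glucose_tolerance`, `wc` and `hba1c` from the plot.

```r
library(dplyr)

# Toy stand-in for `dataset`; the values are illustrative only
toy <- tibble(
  glucose_tolerance = rep(c("normal", "t2d"), each = 4),
  wc    = c(80, 85, 90, 95, 95, 100, 105, 110),
  hba1c = c(35, 36, 36, 38, 48, 52, 55, 61)
)

# Group means and within-group Pearson correlation of WC and HbA1c
group_summary <- toy |>
  group_by(glucose_tolerance) |>
  summarise(
    mean_wc = mean(wc),
    mean_hba1c = mean(hba1c),
    pearson_r = cor(wc, hba1c)
  )
```

Comparing group means shows how far apart the groups sit, while the within-group correlation describes the trend that `geom_smooth(method = "lm")` draws for each group.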

linear_hdl_hba1c_glucose_tolerance <- dataset |>
  ggplot(aes(x = hdl,
             y = hba1c,
             colour = glucose_tolerance)) +
  geom_point(size = 2,
             alpha = 0.5) +
  labs(x = "HDL [mmol/L]",
       y = "Hemoglobin A1C [mmol/mol]",
       title = "HDL vs Hemoglobin A1C grouped by glucose tolerance",
       colour = "Glucose tolerance: ") +
  theme_minimal(base_size = 12)

# Save the HDL plot defined above (not the WC plot from the previous chunk)
save_plot_custom(
  plot = linear_hdl_hba1c_glucose_tolerance,
  filename = "07_linear_hdl_hba1c_glucose_tolerance.jpg"
)
linear_hdl_hba1c_glucose_tolerance

The figure above shows a scatter plot of HDL on the x-axis against Hemoglobin A1C on the y-axis, coloured by glucose tolerance. We observe that Hemoglobin A1C separates the groups better than HDL does.

hdl_glucose_tolerance <- dataset |> 
  ggplot(aes(x = glucose_tolerance,
             y = hdl,
             fill = glucose_tolerance)) + 
  geom_boxplot(color = "black") + 
  labs(
    x = "Glucose tolerance",
    y = "HDL [mmol/L]",
    title = "HDL vs glucose tolerance"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, hjust = 0.5)
  )

save_plot_custom(
  plot = hdl_glucose_tolerance,
  filename = "07_hdl_glucose_tolerance.jpg"
)
hdl_glucose_tolerance

We can see that HDL levels decrease as glucose tolerance worsens. This agrees with the literature, which reports lower HDL in diabetic patients.

ldl_glucose_tolerance <- dataset |> 
  ggplot(aes(x = glucose_tolerance,
             y = ldl,
             fill = glucose_tolerance)) + 
  geom_boxplot(color = "black") + 
  labs(
    x = "Glucose tolerance",
    y = "LDL level [mmol/L]",
    title = "LDL level vs glucose tolerance"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, hjust = 0.5)
  )

save_plot_custom(
  plot = ldl_glucose_tolerance,
  filename = "07_ldl_glucose_tolerance.jpg"
)
ldl_glucose_tolerance

Surprisingly, we see the same trend for LDL, which is instead expected to be either higher or similar across the categories.

hb1ac_glucose_tolerance <- dataset |> 
  ggplot(aes(x = glucose_tolerance,
             y = hba1c,
             fill = glucose_tolerance)) + 
  geom_boxplot(color = "black") + 
  labs(
    x = "Glucose tolerance",
    y = "Hemoglobin A1C [mmol/mol]",
    title = "Hemoglobin A1C vs glucose tolerance"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, hjust = 0.5)
  )

save_plot_custom(
  plot = hb1ac_glucose_tolerance,
  filename = "07_hb1ac_glucose_tolerance.jpg"
)
hb1ac_glucose_tolerance

The effect of treatment

First, we investigate how many people in each glucose tolerance category are on statin treatment. There are 26 people with type 2 diabetes who are on statins. We consider only statins as the treatment, since the table shows that fewer than 5% of the samples are on insulin.

Statins are used as preventive treatment against cardiovascular diseases, but have been linked to an increased risk of developing diabetes (reference).

count_statins_graph <- dataset |>
  group_by(glucose_tolerance) |> 
  count(statins) |>
  filter(statins =="y") |> 
  ggplot(aes(x = glucose_tolerance,
             y = n,
             fill = glucose_tolerance,
             label = n)) +
  geom_col(colour = "black",
           alpha = 0.4) +
  theme_minimal(base_size = 15) +
  labs(x = "",
       y = "on statins") +
  geom_hline(yintercept = 0) +
  geom_text(vjust = -0.5, size = 5)+
  ylim(0,45)+
  theme(legend.position = "none",
        axis.text.y = element_blank())

save_plot_custom(
  plot = count_statins_graph,
  filename = "07_statins_amount.jpg"
)
count_statins_graph

percentage_statins_graph <- dataset |>
  group_by(glucose_tolerance) |> 
  count(statins) |>
  mutate(percentage = n / sum(n) * 100)|>
  filter(statins =="y") |> 
  # Plot the percentage itself so that bar heights match the labels and title
  ggplot(aes(x = glucose_tolerance,
             y = percentage,
             fill = glucose_tolerance,
             label = str_c(round(percentage,digits = 2),"%"))) +
  geom_col(colour = "black",
           alpha = 0.4) +
  theme_minimal(base_size = 15) +
  labs(x = "",
       y = "% receiving statins",
       title = "% of each patient group receiving statin treatment") +
  geom_hline(yintercept = 0) +
  geom_text(vjust = -0.5, size = 5)+
  # Upper limit raised from 45: the t2d group reaches 49.1% in the table above
  ylim(0,60)+
  theme(legend.position = "none",
        axis.text.y = element_blank())

save_plot_custom(
  plot = percentage_statins_graph,
  filename = "07_percentage_statins_graph.jpg"
)
percentage_statins_graph

Now we want to check, within the type 2 diabetes group, whether fasting insulin is influenced by statin treatment. In the plot below, we see that fasting insulin is not visibly affected by the medication, but the cholesterol level is lower in patients on medication.

linear_fasting_insuline_cholesterol <- dataset |> 
  filter(glucose_tolerance == "t2d") |> 
  ggplot(aes(x = fasting_insulin,
             y = cholesterol,
             colour = statins)) +
  geom_point(size = 2,
             alpha = 0.4) +
  geom_smooth(method = "lm",
              se = FALSE) +
  labs(x = "Fasting insulin [pmol/L]",
       y = "Cholesterol [mmol/L]",
       title = "Fasting insulin vs cholesterol grouped by\nstatins for type 2 diabetes patients",
       colour = "Receives statins: ") +
  theme_minimal(base_size = 12)

save_plot_custom(
  plot = linear_fasting_insuline_cholesterol,
  filename = "07_linear_fasting_insuline_cholesterol.jpg"
)
`geom_smooth()` using formula = 'y ~ x'
linear_fasting_insuline_cholesterol
`geom_smooth()` using formula = 'y ~ x'

We also want to see whether the medication has an effect on c-peptide and HDL. The plot below shows that the difference is minor.

linear_c_peptide_hdl <- dataset |> 
  filter(glucose_tolerance == "t2d") |> 
  ggplot(aes(x = `c-peptide`,
             y = hdl,
             colour = statins)) +
  geom_point(size = 2,
             alpha = 0.4) +
  geom_smooth(method = "lm",
              se = FALSE) +
  labs(x = "c-peptide [ng/mL]",
       y = "HDL [mmol/L]",
       title = "c-peptide vs HDL grouped by statins for\ntype 2 diabetes patients",
       colour = "Receives statins: ")+
  theme_minimal(base_size = 12)

save_plot_custom(
  plot = linear_c_peptide_hdl,
  filename = "07_linear_c_peptide_hdl.jpg"
)
`geom_smooth()` using formula = 'y ~ x'
linear_c_peptide_hdl
`geom_smooth()` using formula = 'y ~ x'

We can see that the number of obese and overweight people increases in the glucose-impaired and diabetic categories.

# Cholesterol by statin use, faceted by glucose tolerance
# (the single underweight sample is excluded)
statins_cholesterol <- dataset |> 
  filter(BMI_class != "Underweight") |> 
  ggplot(aes(x = statins, y = cholesterol, fill = statins)) + 
  geom_boxplot(color = "black") +
  labs(
    x = "",
    y = "Cholesterol [mmol/L]",
    fill = "Receives statins"
  ) +
  facet_wrap(~glucose_tolerance) +
  theme_minimal()

save_plot_custom(
  plot = statins_cholesterol,
  filename = "07_statins_cholesterol.jpg"
)
statins_cholesterol

We can see that treatment with statins has the effect of lowering cholesterol, which is consistent with what is expected.
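
The boxplot comparison can be complemented with a small summary table. The following sketch is illustrative only: `toy` is a made-up stand-in for `dataset`, and the numbers are not results from the project data.

```r
library(dplyr)

# Toy stand-in for `dataset`; the values are illustrative only
toy <- tibble(
  glucose_tolerance = rep(c("normal", "t2d"), each = 4),
  statins = rep(c("n", "n", "y", "y"), times = 2),
  cholesterol = c(5.9, 5.7, 5.1, 4.9, 5.6, 5.4, 4.6, 4.4)
)

# Sample size and median cholesterol for each combination of
# glucose tolerance group and statin use
cholesterol_by_statins <- toy |>
  group_by(glucose_tolerance, statins) |>
  summarise(
    n_samples = n(),
    median_cholesterol = median(cholesterol),
    .groups = "drop"
  )
```

Reporting the sample size next to each median makes it easier to judge how much weight a difference between treated and untreated patients can carry.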

# # clean environment
# rm(list = ls() |>  
#       keep(~ !is.function(get(.))) |>  
#       discard(~ . %in% ls()))